Skip to content

fix(go-api): sync document handler interface and enforce preview acce…#15688

Open
Hz-186 wants to merge 2 commits into
infiniflow:mainfrom
Hz-186:fix-go-api-document-handlers
Open

fix(go-api): sync document handler interface and enforce preview acce…#15688
Hz-186 wants to merge 2 commits into
infiniflow:mainfrom
Hz-186:fix-go-api-document-handlers

Conversation

@Hz-186
Copy link
Copy Markdown
Contributor

@Hz-186 Hz-186 commented Jun 5, 2026

Description

This PR syncs the documentServiceIface interface and introduces handler methods for document preview, artifact fetching, and downloading in the Go API. It also ensures that strict dataset alignment and access checks are enforced when retrieving or downloading documents.

Furthermore, this PR introduces comprehensive unit tests for both the newly added Handler and Service methods to ensure robustness and prevent future regressions.

Key Changes

  • Router & Handler Integration:
    • Added and wired new API endpoints in internal/router/router.go.
    • Synchronized the documentServiceIface with GetDocumentArtifact, GetDocumentPreview, and DownloadDocument.
    • Implemented handlers for these endpoints in internal/handler/document.go.
  • Access & Validation Enforcement:
    • Refactored internal/service/document.go to strictly check if a document belongs to the requested dataset before allowing downloads or previews.
    • Added robust artifact file sanitization (sanitizeArtifactFilename) and attachment handling (shouldForceArtifactAttachment).
  • Comprehensive Unit Testing:
    • Handler Layer (internal/handler/document_test.go): Added mock service implementations and Gin router tests covering success, not-found, and internal error states for all 3 new endpoints.
    • Service Layer (internal/service/document_test.go): Added extensive business logic tests including dataset mismatch checks, non-existent document checks, and artifact file validation.

Testing

All unit tests have been executed locally and passed successfully:

  • Tested TestGetDocumentArtifact_* (Success, NotFound, UnexpectedError, InvalidFilename, InvalidFileType)
  • Tested TestGetDocumentPreview_* (Success, NotFound)
  • Tested TestDownloadDocument_* (Success, MissingDocID, WrongDataset)
  • Tested TestArtifactHelpers (Sanitize logic & content type assertions)

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 5, 2026
@Hz-186 Hz-186 marked this pull request as draft June 5, 2026 04:35
@Hz-186 Hz-186 marked this pull request as ready for review June 5, 2026 04:36
@Hz-186 Hz-186 marked this pull request as draft June 5, 2026 04:37
@Hz-186 Hz-186 marked this pull request as ready for review June 5, 2026 04:37
@Hz-186 Hz-186 added the ci Continue Integration label Jun 5, 2026
@Hz-186 Hz-186 marked this pull request as draft June 5, 2026 04:37
@Hz-186 Hz-186 marked this pull request as ready for review June 5, 2026 04:37
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds three document serving endpoints to the HTTP API. GetDocumentArtifact and GetDocumentPreview serve sandbox artifacts and document previews from object storage with validated filenames and content-types. DownloadDocument retrieves document binaries linked to specific datasets. All endpoints set appropriate Content-Type and Content-Disposition headers.

Changes

Document Serving Endpoints

Layer / File(s) Summary
Service foundation: types, errors, and interface
internal/service/document.go, internal/handler/document.go
Imports for storage/utility logic, artifact response types, three new error sentinels (ErrArtifactNotFound, ErrArtifactInvalidFilename, ErrArtifactInvalidFileType), extension/content-type allowlists, and service interface extended with GetDocumentArtifact, GetDocumentPreview, DownloadDocument.
GetDocumentArtifact service and tests
internal/service/document.go, internal/service/document_test.go
Validates artifact filename as safe basename, checks against whitelisted extensions, fetches from sandbox storage, sanitizes response filename, and sets force-attachment flag; tests cover invalid filename/type errors and helper function behavior.
GetDocumentPreview and storage resolution
internal/service/document.go, internal/service/document_test.go
GetDocumentStorageAddress resolves storage bucket/key by consulting file-to-document mappings or document-level attributes; GetDocumentPreview loads document, resolves storage, fetches preview bytes, computes content-type from document type and extension; test validates missing document error.
DownloadDocument service and tests
internal/service/document.go, internal/service/document_test.go
Validates datasetID and docID, ensures document belongs to dataset, resolves storage address, rejects empty payloads, and returns bytes with application/octet-stream content-type; tests validate docID and dataset membership validation.
HTTP handlers, fake service, and handler tests
internal/handler/document.go, internal/handler/document_test.go
Three new handlers invoke corresponding service methods with parameter validation and error mapping; set Content-Type and Content-Disposition (inline vs attachment); fake service provides conditional responses for success, not-found, and error cases; test suite validates HTTP status, headers, and response body for all three endpoints.
Route registration
internal/router/router.go
Adds GET /api/v1/documents/:filename (artifact) and /:id (preview) routes, plus GET /api/v1/datasets/:dataset_id/documents/:document_id download route under authenticated middleware.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • infiniflow/ragflow#14011: Adds related document/file download endpoints that share filename/content-type "force attachment" and MIME utility logic for setting Content-Disposition headers conditionally.

Suggested labels

lgtm

Suggested reviewers

  • Haruko386

Poem

🐰 Hops with glee through storage bins,
Artifacts dance, previews spin,
Downloads streaming, headers right,
Three new endpoints, pure delight!
Safe basenames keep villains out,
Of that, there is no doubt!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is truncated/incomplete and uses vague phrasing ('enforce preview acce…'), making it unclear what the full scope of changes is. Complete the title with clear, specific language describing the main changes, e.g., 'Add document artifact, preview, and download handlers with service interface sync and access validation'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is comprehensive, well-organized, and covers all required template sections including problem statement, type of change, key changes, and testing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/handler/document_test.go (2)

480-539: ⚡ Quick win

Add tests for Content-Disposition headers and parameter validation.

The test coverage for GetDocumentArtifact is missing:

  1. Validation of the Content-Disposition header. According to the PR objectives, there's attachment handling logic (shouldForceArtifactAttachment). The success test should verify that when ForceAttachment is false, the header is set to inline; filename="safe.txt", and there should be a test case where ForceAttachment is true to verify the attachment disposition.
  2. Edge case tests for parameter validation (e.g., empty or missing filename parameter).
📋 Suggested test additions

Add a test to verify the Content-Disposition header in the success case:

 func TestGetDocumentArtifact_Success(t *testing.T) {
 	gin.SetMode(gin.TestMode)
 	h := &DocumentHandler{
 		documentService: &fakeDocumentService{},
 	}
 	c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/test.txt", "")
 	c.Params = gin.Params{{Key: "filename", Value: "test.txt"}}
 
 	h.GetDocumentArtifact(c)
 
 	if w.Code != http.StatusOK {
 		t.Fatalf("expected 200, got %d", w.Code)
 	}
 	if w.Header().Get("Content-Type") != "text/plain" {
 		t.Fatalf("unexpected content type: %s", w.Header().Get("Content-Type"))
 	}
+	if contentDisp := w.Header().Get("Content-Disposition"); contentDisp != "inline; filename=\"safe.txt\"" {
+		t.Fatalf("unexpected Content-Disposition: %s", contentDisp)
+	}
 	if w.Body.String() != "artifact content" {
 		t.Fatalf("unexpected body: %s", w.Body.String())
 	}
 }

Add a test for the attachment disposition case by extending the fake service to return a response with ForceAttachment: true for a specific filename:

func TestGetDocumentArtifact_Attachment(t *testing.T) {
	gin.SetMode(gin.TestMode)
	// Create a custom fake that returns ForceAttachment=true
	fake := &fakeDocumentService{}
	// Override the method for this specific test, or extend fake to handle "attachment.txt"
	h := &DocumentHandler{
		documentService: fake,
	}
	c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/attachment.txt", "")
	c.Params = gin.Params{{Key: "filename", Value: "attachment.txt"}}

	h.GetDocumentArtifact(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d", w.Code)
	}
	if contentDisp := w.Header().Get("Content-Disposition"); !strings.HasPrefix(contentDisp, "attachment;") {
		t.Fatalf("expected attachment disposition, got: %s", contentDisp)
	}
}

Add a test for empty filename:

func TestGetDocumentArtifact_EmptyFilename(t *testing.T) {
	gin.SetMode(gin.TestMode)
	h := &DocumentHandler{
		documentService: &fakeDocumentService{},
	}
	c, w := setupGinContextWithUser("GET", "/api/v1/documents/artifact/", "")
	c.Params = gin.Params{{Key: "filename", Value: ""}}

	h.GetDocumentArtifact(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d", w.Code)
	}
	var resp map[string]interface{}
	json.Unmarshal(w.Body.Bytes(), &resp)
	if resp["code"] == float64(common.CodeSuccess) {
		t.Fatal("expected error for empty filename")
	}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/document_test.go` around lines 480 - 539, Update tests for
GetDocumentArtifact on DocumentHandler to assert Content-Disposition behavior
and validate filename params: extend or modify fakeDocumentService to return
ForceAttachment=true for a specific filename (e.g., "attachment.txt") and add a
TestGetDocumentArtifact_Attachment that calls
DocumentHandler.GetDocumentArtifact and asserts
w.Header().Get("Content-Disposition") starts with "attachment;"; also enhance
the success test (TestGetDocumentArtifact_Success) to assert Content-Disposition
equals `inline; filename="safe.txt"` when ForceAttachment is false (use the fake
to return that behavior), and add a TestGetDocumentArtifact_EmptyFilename that
calls GetDocumentArtifact with an empty gin Param filename and asserts the
response is an error (non-success code) instead of CodeSuccess; reference
DocumentHandler, GetDocumentArtifact, and fakeDocumentService to locate the code
to change.

541-621: ⚡ Quick win

Add tests for Content-Disposition headers and dataset mismatch.

The test coverage is missing:

  1. Validation of Content-Disposition headers for both GetDocumentPreview and DownloadDocument. The PR objectives mention these endpoints set appropriate headers.
  2. For DownloadDocument, a critical test case for dataset mismatch: when a document exists but belongs to a different dataset than the one specified in the URL. According to the PR objectives, the service "enforce[s] strict dataset alignment and access checks."
  3. Parameter validation edge cases (empty docID, empty datasetID).
📋 Suggested test additions

Add Content-Disposition validation to the success tests:

 func TestGetDocumentPreview_Success(t *testing.T) {
 	// ... existing setup ...
 	h.GetDocumentPreview(c)
 
 	if w.Code != http.StatusOK {
 		t.Fatalf("expected 200, got %d", w.Code)
 	}
 	if w.Header().Get("Content-Type") != "text/plain" {
 		t.Fatalf("unexpected content type: %s", w.Header().Get("Content-Type"))
 	}
+	if contentDisp := w.Header().Get("Content-Disposition"); contentDisp != "inline; filename=\"preview.txt\"" {
+		t.Fatalf("unexpected Content-Disposition: %s", contentDisp)
+	}
 	if w.Body.String() != "preview content" {
 		t.Fatalf("unexpected body: %s", w.Body.String())
 	}
 }

Add a dataset mismatch test for DownloadDocument:

func TestDownloadDocument_DatasetMismatch(t *testing.T) {
	gin.SetMode(gin.TestMode)
	// Extend fakeDocumentService to return an error for dataset mismatch
	// or use a specific combination that triggers this error
	fake := &fakeDocumentService{}
	h := &DocumentHandler{
		documentService: fake,
	}
	c, w := setupGinContextWithUser("GET", "/api/v1/datasets/wrong-dataset/documents/doc-1", "")
	c.Params = gin.Params{{Key: "dataset_id", Value: "wrong-dataset"}, {Key: "document_id", Value: "doc-1"}}

	h.DownloadDocument(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d", w.Code)
	}
	var resp map[string]interface{}
	json.Unmarshal(w.Body.Bytes(), &resp)
	// Verify appropriate error code for dataset mismatch
	if resp["code"] == float64(common.CodeSuccess) {
		t.Fatal("expected error for dataset mismatch")
	}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/document_test.go` around lines 541 - 621, Add assertions for
Content-Disposition headers in TestGetDocumentPreview_Success (verify the header
includes an appropriate filename and disposition like "inline; filename=...")
and in TestDownloadDocument_Success (verify "attachment; filename=..."), extend
fakeDocumentService or add a specific behavior in the fake used by
DownloadDocument tests so that requesting doc-1 with a different dataset_id
returns a dataset-mismatch error and write a new
TestDownloadDocument_DatasetMismatch that calls DownloadDocument with dataset_id
"wrong-dataset" and asserts the response error code is not common.CodeSuccess,
and finally add lightweight parameter validation tests calling
GetDocumentPreview and DownloadDocument with empty docID and/or datasetID (use
setupGinContextWithUser and setting c.Params to empty "") expecting the handler
to return the appropriate error code; reference GetDocumentPreview,
DownloadDocument, fakeDocumentService, setupGinContextWithUser in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/document.go`:
- Around line 480-482: The Content-Disposition header is using res.FileName
directly, risking header injection; update the download handler to sanitize the
filename before use by calling the same sanitizer used elsewhere
(sanitizeArtifactFilename) or by using the already-sanitized
artifact.SafeFilename pattern like in GetDocumentArtifact; replace
fmt.Sprintf(`attachment; filename="%s"`, res.FileName) with a safe value (e.g.,
safeName := sanitizeArtifactFilename(res.FileName) and use safeName) so
quotes/newlines are removed/escaped before setting the header and keep
c.Data(...) unchanged.

---

Nitpick comments:
In `@internal/handler/document_test.go`:
- Around line 480-539: Update tests for GetDocumentArtifact on DocumentHandler
to assert Content-Disposition behavior and validate filename params: extend or
modify fakeDocumentService to return ForceAttachment=true for a specific
filename (e.g., "attachment.txt") and add a TestGetDocumentArtifact_Attachment
that calls DocumentHandler.GetDocumentArtifact and asserts
w.Header().Get("Content-Disposition") starts with "attachment;"; also enhance
the success test (TestGetDocumentArtifact_Success) to assert Content-Disposition
equals `inline; filename="safe.txt"` when ForceAttachment is false (use the fake
to return that behavior), and add a TestGetDocumentArtifact_EmptyFilename that
calls GetDocumentArtifact with an empty gin Param filename and asserts the
response is an error (non-success code) instead of CodeSuccess; reference
DocumentHandler, GetDocumentArtifact, and fakeDocumentService to locate the code
to change.
- Around line 541-621: Add assertions for Content-Disposition headers in
TestGetDocumentPreview_Success (verify the header includes an appropriate
filename and disposition like "inline; filename=...") and in
TestDownloadDocument_Success (verify "attachment; filename=..."), extend
fakeDocumentService or add a specific behavior in the fake used by
DownloadDocument tests so that requesting doc-1 with a different dataset_id
returns a dataset-mismatch error and write a new
TestDownloadDocument_DatasetMismatch that calls DownloadDocument with dataset_id
"wrong-dataset" and asserts the response error code is not common.CodeSuccess,
and finally add lightweight parameter validation tests calling
GetDocumentPreview and DownloadDocument with empty docID and/or datasetID (use
setupGinContextWithUser and setting c.Params to empty "") expecting the handler
to return the appropriate error code; reference GetDocumentPreview,
DownloadDocument, fakeDocumentService, setupGinContextWithUser in your changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dcee25d0-5931-43f3-9e66-7334ed7d9b90

📥 Commits

Reviewing files that changed from the base of the PR and between ee32d91 and 920feab.

📒 Files selected for processing (5)
  • internal/handler/document.go
  • internal/handler/document_test.go
  • internal/router/router.go
  • internal/service/document.go
  • internal/service/document_test.go

Comment on lines +480 to +482
c.Header("Content-Type", res.ContentType)
c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, res.FileName))
c.Data(http.StatusOK, res.ContentType, res.Data)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize filename in Content-Disposition header to prevent header injection.

res.FileName originates from doc.Name and is used directly in the header without sanitization. If the document name contains characters like ", \r, or \n, this could result in malformed headers or header injection.

The GetDocumentArtifact handler correctly uses artifact.SafeFilename (which is sanitized via sanitizeArtifactFilename). Apply similar sanitization here for consistency and safety.

Proposed fix
+func sanitizeDownloadFilename(filename string) string {
+	// Remove or replace characters that could cause issues in Content-Disposition headers
+	replacer := strings.NewReplacer(`"`, `_`, "\r", "", "\n", "", "\\", "_")
+	return replacer.Replace(filename)
+}
+
 func (h *DocumentHandler) DownloadDocument(c *gin.Context) {
 	// ... existing code ...

 	c.Header("Content-Type", res.ContentType)
-	c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, res.FileName))
+	c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s"`, sanitizeDownloadFilename(res.FileName)))
 	c.Data(http.StatusOK, res.ContentType, res.Data)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/document.go` around lines 480 - 482, The Content-Disposition
header is using res.FileName directly, risking header injection; update the
download handler to sanitize the filename before use by calling the same
sanitizer used elsewhere (sanitizeArtifactFilename) or by using the
already-sanitized artifact.SafeFilename pattern like in GetDocumentArtifact;
replace fmt.Sprintf(`attachment; filename="%s"`, res.FileName) with a safe value
(e.g., safeName := sanitizeArtifactFilename(res.FileName) and use safeName) so
quotes/newlines are removed/escaped before setting the header and keep
c.Data(...) unchanged.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hz-186 pls check this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (ee32d91) to head (920feab).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15688   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          10       10           
  Lines         717      717           
  Branches      118      118           
=======================================
  Hits          668      668           
  Misses         29       29           
  Partials       20       20           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Continue Integration size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants